Abstract

Online Passive-Aggressive (PA) learning is a class of online margin-based algorithms suitable for a wide range of real-time prediction tasks, including classification and regression. PA algorithms are formulated in terms of deterministic point-estimation problems governed by a set of user-defined hyperparameters: this approach fails to capture model/prediction uncertainty and makes their performance highly sensitive to hyperparameter configurations. In this paper, we introduce a novel PA learning framework for regression that overcomes the above limitations. We contribute a Bayesian state-space interpretation of PA regression, along with a novel online variational inference scheme, that not only produces probabilistic predictions, but also offers the benefit of automatic hyperparameter tuning. Experiments with various real-world data sets show that our approach performs significantly better than a more standard linear Gaussian state-space model.


1 Introduction 

Online learning is the most common approach to learning from non-stationary and/or large sequential data sets. In online learning, model parameters are learned in a sequential manner, thus achieving temporal adaptation and learning efficiency in time-aware applications. Among the popular algorithms, online Passive-Aggressive (PA) learning [1] provides a generic family of online margin-based algorithms for various time-aware applications, including classification and regression. However, despite their merits, PA algorithms make point rather than probabilistic predictions, and depend on a set of hyperparameters that are assumed to be user-defined and constant over time. This assumption is impractical for at least two reasons. First, it has recently been argued that the performance of many machine learning algorithms is highly sensitive to hyperparameter settings [?], and PA learning is unlikely to be an exception because its performance is measured in terms of cumulative loss. Second, in non-stationary environments, optimal hyperparameter choices may quickly become sub-optimal, due to the evolving nature of the underlying population distributions.

To address these drawbacks, we propose a new online PA method based on a Bayesian treatment of the existing PA framework. We concentrate here on PA learning for regression. Our algorithm incorporates a novel, online, variational inference scheme. Furthermore, it explicitly takes into account uncertainty in our predictions and is endowed with a self-tuning hyperparameter mechanism.

The main contributions of the paper are twofold. Firstly, this paper is, to the best of our knowledge, the first to approach online PA regression from a Bayesian state-space perspective. We will indeed show that the state-space representation of PA regression results in a Bayesian linear Gaussian state-space model (LGSSM). Secondly, we establish a clear connection between our online variational inference procedure and Streaming Variational Bayes [?], thus making the first application of the latter to the Bayesian LGSSM setting.

2 Bayesian State-Space Approach to Passive-Aggressive Regression 

In this section, we provide a Bayesian treatment of online PA regression within a state-space framework. We show that the state-space model (SSM) corresponding to PA regression is, conditionally upon the mean and variance of the measurement noise, a special case of the Bayesian LGSSM, and that it justifies the PA regression algorithm from a maximum a posteriori (MAP) standpoint.

2.1 Online Passive-Aggressive Regression 

Consider a data stream consisting of examples $\{(\mathbf{x}_t, y_t)\}_{t\geq 1}$, where $\mathbf{x}_t \in \mathbb{R}^I$ is an $I$-dimensional input vector and $y_t \in \mathbb{R}$ is the associated output. Online PA regression [1] is based on the linear prediction model of the form $f(\mathbf{x}) = \mathbf{x}^\top\mathbf{w}$, where $\mathbf{w} \in \mathbb{R}^I$ is the incrementally learned weight vector. The PA regression algorithm initialises the weight vector to the zero vector ($\mathbf{w}_0 = \mathbf{0}_{I\times 1}$) and, after observing the $t$-th example, obtains the new weight vector as the solution to¹

$$\min_{\mathbf{w}_t}\ \frac{1}{2}\|\mathbf{w}_t - \mathbf{w}_{t-1}\|^2 + C\,\ell(y_t, \mathbf{x}_t^\top\mathbf{w}_t; \epsilon), \tag{1}$$

where $\ell(y, \hat{y}; \epsilon) = |y - \hat{y}|_\epsilon = \max(|y - \hat{y}| - \epsilon, 0)$ is the $\epsilon$-insensitive loss function ($\epsilon$-ILF) and $C > 0$ is a user-specified parameter. The intuitive goal of PA regression is to change the existing weight estimate as little as possible while predicting the new example as accurately as possible. The parameter $C$ balances these two competing objectives: larger values of $C$ imply a more aggressive update step, whence the name aggressiveness parameter [1].
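Problem (1) has a well-known closed-form solution for the PA-I variant, which this section does not restate. The sketch below is an illustrative NumPy rendering of that standard update; the function name `pa1_regression_update` is hypothetical, and a nonzero input vector is assumed.

```python
import numpy as np

def pa1_regression_update(w_prev, x, y, C, eps):
    """One PA-I regression step: closed-form minimiser of Eq. (1).

    w_prev : previous weight vector w_{t-1}, shape (I,)
    x      : input vector x_t, shape (I,), assumed nonzero
    y      : observed output y_t
    C      : aggressiveness parameter, C > 0
    eps    : insensitivity parameter of the epsilon-ILF
    """
    y_hat = float(x @ w_prev)
    loss = max(abs(y - y_hat) - eps, 0.0)         # epsilon-insensitive loss
    if loss == 0.0:
        return w_prev.copy()                       # passive: prediction already inside the tube
    tau = min(C, loss / float(x @ x))              # step size, capped at C for PA-I
    return w_prev + np.sign(y - y_hat) * tau * x   # aggressive: move towards y_t
```

For instance, starting from $\mathbf{w}_0 = \mathbf{0}$ with $\mathbf{x}_1 = (1, 2)^\top$, $y_1 = 3$, $C = 10$ and $\epsilon = 0.5$, the loss is $2.5$, the step is $\tau = 0.5$ and the new prediction $\mathbf{x}_1^\top\mathbf{w}_1 = 2.5$ lands exactly on the edge of the $\epsilon$-tube; with $C = 0.1$ the step is capped at $\tau = 0.1$, which is how $C$ limits aggressiveness.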

2.2 Bayesian Linear Gaussian State-Space Models 

LGSSMs² are fundamental in time-series analysis [4, 5]. In these models, each output $y_t$ is generated from an underlying dynamical system on the hidden variable $\mathbf{h}_t$ according to:

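$$y_t = \mathbf{b}^\top\mathbf{h}_t + \eta_t,\quad \eta_t \sim \mathcal{N}(\eta_t|0, \sigma^2),\qquad \mathbf{h}_t = \mathbf{A}\mathbf{h}_{t-1} + \boldsymbol{\eta}^h_t,\quad \boldsymbol{\eta}^h_t \sim \mathcal{N}(\boldsymbol{\eta}^h_t|\mathbf{0}_{H\times 1}, \boldsymbol{\Sigma}_H), \tag{2}$$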

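where $H = \dim(\mathbf{h}_t)$. The initial latent variable also has a Gaussian distribution, which we write as $p(\mathbf{h}_1) = \mathcal{N}(\mathbf{h}_1|\boldsymbol{\mu}_\pi, \boldsymbol{\Sigma}_\pi)$. The model parameters are therefore $\theta = (\mathbf{A}, \mathbf{b}, \boldsymbol{\Sigma}_H, \sigma^2, \boldsymbol{\mu}_\pi, \boldsymbol{\Sigma}_\pi)$. In the Bayesian treatment of the LGSSM, instead of considering $\theta$ as fixed, we define a prior distribution $p(\theta|\boldsymbol{\omega})$, where $\boldsymbol{\omega}$ is a vector of hyperparameters.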
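To make the generative process in Eq. (2) concrete, the hypothetical helper `simulate_lgssm` below draws one trajectory for fixed parameters $\theta$. It is a minimal NumPy sketch for intuition, not part of the method.

```python
import numpy as np

def simulate_lgssm(A, b, Sigma_H, sigma2, mu_pi, Sigma_pi, T, seed=0):
    """Draw (h_{1:T}, y_{1:T}) from the LGSSM of Eq. (2) for fixed parameters theta."""
    rng = np.random.default_rng(seed)
    H = len(mu_pi)
    h = rng.multivariate_normal(mu_pi, Sigma_pi)          # h_1 ~ N(mu_pi, Sigma_pi)
    hs, ys = [], []
    for t in range(T):
        if t > 0:                                         # h_t = A h_{t-1} + eta^h_t
            h = A @ h + rng.multivariate_normal(np.zeros(H), Sigma_H)
        # y_t = b^T h_t + eta_t, with eta_t ~ N(0, sigma^2)
        ys.append(float(b @ h) + rng.normal(0.0, np.sqrt(sigma2)))
        hs.append(h)
    return np.array(hs), np.array(ys)
```

With all covariances set to zero the recursion becomes deterministic, which makes it easy to check: $\mathbf{h}_t = \mathbf{A}^{t-1}\boldsymbol{\mu}_\pi$ and $y_t = \mathbf{b}^\top\mathbf{h}_t$.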

2.3 Bayesian State-Space Representation of Passive-Aggressive Regression 

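Let $\mathbf{I}_I$ be the identity matrix of order $I$. The state-space representation of PA regression is given by

$$y_t = \mathbf{x}_t^\top\mathbf{w}_t + \eta_t,\quad \eta_t \sim p(\eta_t|\epsilon),\qquad \mathbf{w}_t = \mathbf{w}_{t-1} + \boldsymbol{\eta}^w_t,\quad \boldsymbol{\eta}^w_t \sim \mathcal{N}(\boldsymbol{\eta}^w_t|\mathbf{0}_{I\times 1}, \alpha^{-1}\mathbf{I}_I), \tag{3}$$

with the convention that $\mathbf{w}_0 = \mathbf{0}_{I\times 1}$, and where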

$$p(\eta_t|\epsilon) = \frac{1}{2(1+\epsilon)}\exp\left(-|\eta_t|_\epsilon\right) \tag{4}$$

is the measurement-noise density dictated by the $\epsilon$-ILF [6]. In this case, the weight posterior satisfies³

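$$p(\mathbf{w}_t|y_t, \mathbf{w}_{t-1}) \propto p(y_t|\mathbf{w}_t)\,p(\mathbf{w}_t|\mathbf{w}_{t-1}) = \frac{1}{2(1+\epsilon)}\exp\left(-|y_t - \mathbf{x}_t^\top\mathbf{w}_t|_\epsilon\right)\mathcal{N}(\mathbf{w}_t|\mathbf{w}_{t-1}, \alpha^{-1}\mathbf{I}_I). \tag{5}$$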

Setting $\alpha = C^{-1}$ in the above equation, with $\mathbf{w}_{t-1}$ taken to be the previous PA estimate, taking the negative logarithm thereof, multiplying by $C$ and ignoring any resulting additive constant yields the PA objective from Eq. (1). We thus obtain a MAP justification for the PA regression algorithm.
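The MAP argument can be checked numerically: with $\alpha = C^{-1}$, the quantity $C \cdot [-\log p(y_t|\mathbf{w}_t)\,p(\mathbf{w}_t|\mathbf{w}_{t-1})]$ and the PA objective of Eq. (1) should differ only by a constant that does not depend on $\mathbf{w}_t$, so both have the same minimiser. The helper names below are hypothetical and the check is purely illustrative.

```python
import math
import numpy as np

def eps_ilf(r, eps):
    """Epsilon-insensitive loss |r|_eps = max(|r| - eps, 0)."""
    return max(abs(r) - eps, 0.0)

def neg_log_posterior(w, w_prev, x, y, alpha, eps):
    """-log[p(y_t | w_t) p(w_t | w_{t-1})] for the densities in Eqs. (4)-(5)."""
    I = len(w)
    d = w - w_prev
    noise_term = eps_ilf(y - float(x @ w), eps) + math.log(2.0 * (1.0 + eps))
    prior_term = 0.5 * alpha * float(d @ d) + 0.5 * I * math.log(2.0 * math.pi / alpha)
    return noise_term + prior_term

def pa_objective(w, w_prev, x, y, C, eps):
    """Objective of the PA problem in Eq. (1)."""
    d = w - w_prev
    return 0.5 * float(d @ d) + C * eps_ilf(y - float(x @ w), eps)
```

Evaluating `C * neg_log_posterior(w, ..., alpha=1/C, ...) - pa_objective(w, ...)` at several random `w` returns the same number each time, namely $C[\log 2(1+\epsilon) + \tfrac{I}{2}\log(2\pi C)]$.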

¹ We restrict our attention to the PA-I variant of PA regression.

² These are also called Kalman filters/smoothers and linear dynamical systems.

³ For brevity, we omit $\mathbf{x}_t$ from the conditioning statements, and shall do so in the remainder of the paper.
Observe that Eqs. (3) and (4) give a model that is intractable, due to the Laplacian-like noise distribution. Having said that, [?] proved that this distribution can be expressed as a continuous mixture of Gaussians (CMoG). Specifically,⁴

$$p(\eta_t|\epsilon) = \iint \mathcal{N}(\eta_t|\mu, \beta^{-1})\,p(\mu|\epsilon)\,p(\beta)\,\mathrm{d}\mu\,\mathrm{d}\beta, \tag{6}$$


with 

$$p(\beta) = \mathcal{IG}(\beta|1, 1/2) = \frac{1}{2}\beta^{-2}e^{-\frac{1}{2\beta}}, \tag{7}$$

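$$p(\mu|\epsilon) = \mathcal{U}(\mu|-\epsilon, \epsilon) = \frac{1}{2(1+\epsilon)}\left[\mathbb{1}_{[-\epsilon,\epsilon]}(\mu) + \delta(\mu + \epsilon) + \delta(\mu - \epsilon)\right], \tag{8}$$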

where $\mathcal{IG}$ stands for 'inverse Gamma', $\mathbb{1}_S(\cdot)$ for the indicator function of the set $S$, and $\delta(\cdot)$ for the Dirac delta function. The above CMoG formulation implies that, conditionally upon $\beta$ and $\mu$, the SSM described by Eqs. (3)-(8) is a special case of the Bayesian LGSSM from Eq. (2). To retain this formalism, we will, in the first instance, hold $\beta$ and $\mu$ 'fixed'. In the second instance, we will approximately marginalise $\beta$ and $\mu$ by means of an innovative, truly sequential Variational Bayes (VB) routine.
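The CMoG identity in Eqs. (6)-(8) can be verified numerically. The pure-Python sketch below (hypothetical helper names) first integrates out $\beta$, which should yield the unit Laplace kernel $\tfrac{1}{2}e^{-|r|}$, and then mixes that kernel over $p(\mu|\epsilon)$; the result should agree with Eq. (4) up to quadrature error.

```python
import math

def laplace_kernel_via_mixture(r, n=2000, u_max=12.0):
    """Numerically evaluate int N(r | 0, 1/beta) IG(beta | 1, 1/2) d(beta).

    Substituting beta = 1/u^2 turns the integrand into
    exp(-r^2 / (2 u^2) - u^2 / 2) / sqrt(2 pi), which is smooth on (0, inf);
    the result should equal the unit Laplace density exp(-|r|) / 2.
    """
    h = u_max / n
    total = 0.0
    for k in range(n):                          # midpoint rule on (0, u_max)
        u = (k + 0.5) * h
        total += math.exp(-r * r / (2.0 * u * u) - 0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)

def eps_noise_via_cmog(eta, eps, n_mu=200):
    """Eq. (6): mix the kernel above over p(mu | eps) from Eq. (8)."""
    z = 2.0 * (1.0 + eps)                       # normaliser of Eq. (8)
    h = 2.0 * eps / n_mu
    uniform_part = h * sum(
        laplace_kernel_via_mixture(eta - (-eps + (j + 0.5) * h)) for j in range(n_mu)
    )
    atoms = laplace_kernel_via_mixture(eta + eps) + laplace_kernel_via_mixture(eta - eps)
    return (uniform_part + atoms) / z

def eps_noise_direct(eta, eps):
    """Eq. (4): exp(-|eta|_eps) / (2 (1 + eps))."""
    return math.exp(-max(abs(eta) - eps, 0.0)) / (2.0 * (1.0 + eps))
```

The two delta terms in Eq. (8) are what turn the Laplace tails into the flat top of the $\epsilon$-insensitive density: inside $[-\epsilon, \epsilon]$ the uniform part and the two atoms sum to a constant.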

Going forward, we shall refer to the ensuing model as the BaYesian Passive-Aggressive State-Space Model, or BYPASS for short. BYPASS additionally takes the prior over its parameter vector $\theta = (\alpha, \beta, \mu)$ to factorise as


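$$p(\theta|\boldsymbol{\omega}) = \mathcal{G}(\alpha|a, b) \times \mathcal{IG}(\beta|1, 1/2) \times \mathcal{U}(\mu|-\epsilon, \epsilon),\qquad \boldsymbol{\omega} = (a, b, \epsilon). \tag{9}$$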

Note that we have assigned the standard conjugate prior to the weight precision $\alpha$. We do not define any prior for $\boldsymbol{\omega}$.⁵ Probabilistically, the BYPASS model is defined by⁶


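$$p(y_{1:t}, \mathbf{w}_{1:t}, \theta|\boldsymbol{\omega}) = p(y_{1:t}, \mathbf{w}_{1:t}|\theta)\,p(\theta|\boldsymbol{\omega}) = \left[\prod_{\tau=1}^{t} p(y_\tau|\mathbf{w}_\tau, \mu, \beta)\,p(\mathbf{w}_\tau|\mathbf{w}_{\tau-1}, \alpha)\right] p(\theta|\boldsymbol{\omega}), \tag{10}$$

where $p(y_\tau|\mathbf{w}_\tau, \mu, \beta) = \mathcal{N}(y_\tau|\mathbf{x}_\tau^\top\mathbf{w}_\tau + \mu, \beta^{-1})$ and $p(\mathbf{w}_\tau|\mathbf{w}_{\tau-1}, \alpha) = \mathcal{N}(\mathbf{w}_\tau|\mathbf{w}_{\tau-1}, \alpha^{-1}\mathbf{I}_I)$.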
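For intuition about the prior in Eq. (9) and the joint model in Eq. (10), the following sketch samples $\theta$ and then a data stream. It assumes that $\mathcal{G}(\alpha|a, b)$ uses a shape-rate parameterisation (consistent with the prior mean $a/b$ used in Section 5.1) and samples $\mathcal{U}(\mu|-\epsilon, \epsilon)$ by noting that Eq. (8) places mass $\epsilon/(1+\epsilon)$ on the continuous uniform part and $1/(2(1+\epsilon))$ on each atom at $\pm\epsilon$. All function names are hypothetical.

```python
import numpy as np

def sample_mu_prior(eps, rng):
    """Draw mu from Eq. (8): uniform part w.p. eps/(1+eps), else an atom at +eps or -eps."""
    if rng.uniform() < eps / (1.0 + eps):
        return rng.uniform(-eps, eps)
    return eps if rng.uniform() < 0.5 else -eps

def sample_theta(a, b, eps, rng):
    """Draw theta = (alpha, beta, mu) from the factorised prior of Eq. (9)."""
    alpha = rng.gamma(shape=a, scale=1.0 / b)       # G(alpha | a, b), shape-rate (assumed)
    beta = 1.0 / rng.gamma(shape=1.0, scale=2.0)    # IG(beta | 1, 1/2): 1/beta ~ Exp(rate 1/2)
    mu = sample_mu_prior(eps, rng)
    return alpha, beta, mu

def simulate_bypass(X, a, b, eps, seed=0):
    """Sample theta, then random-walk weights and outputs as in Eq. (10)."""
    rng = np.random.default_rng(seed)
    alpha, beta, mu = sample_theta(a, b, eps, rng)
    T, I = X.shape
    w = np.zeros(I)                                 # w_0 = 0
    W, y = np.empty((T, I)), np.empty(T)
    for t in range(T):
        w = w + rng.normal(0.0, alpha ** -0.5, size=I)            # w_t ~ N(w_{t-1}, alpha^-1 I)
        W[t] = w
        y[t] = X[t] @ w + mu + rng.normal(0.0, beta ** -0.5)      # y_t ~ N(x_t^T w_t + mu, beta^-1)
    return (alpha, beta, mu), W, y
```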


3 Genuinely Online Variational Inference 

An exact implementation of Bayesian LGSSMs is formally intractable [?]. Besides sampling methods [?], VB approximations [?] are popular approximate treatments in this context. Nonetheless, the drawback of such VB procedures is that they all require a full pass through the data at each iteration, rendering them impracticable for streaming data. To remedy this, we develop Genuinely Online Variational Inference (GOVI), a novel framework whereby VB may be efficiently deployed in the streaming setting, without the need to revisit past data or have advance knowledge of future data. The rationale behind GOVI is to store the joint BYPASS distribution learned on round $t-1$ so as to recycle it in the subsequent round. This simple principle is reflected by the following probabilistic recursions:

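$$p(y_{1:t-1}, \mathbf{w}_{1:t-1}|\langle\theta\rangle_{1:t-1}) = \prod_{\tau=1}^{t-1} p(y_\tau|\mathbf{w}_\tau, \langle\mu\rangle_\tau, \langle\beta\rangle_\tau)\,p(\mathbf{w}_\tau|\mathbf{w}_{\tau-1}, \langle\alpha\rangle_\tau), \tag{11}$$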

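$$p(y_{1:t}, \mathbf{w}_{1:t}|\theta, \langle\theta\rangle_{1:t-1}) = p(y_t|\mathbf{w}_t, \mu, \beta)\,p(\mathbf{w}_t|\mathbf{w}_{t-1}, \alpha)\,p(y_{1:t-1}, \mathbf{w}_{1:t-1}|\langle\theta\rangle_{1:t-1}), \tag{12}$$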

where $\langle\theta\rangle_\tau = \langle\theta\rangle_{q_\tau(\theta)}$, $\langle f(x)\rangle_{d(x)}$ denotes the expectation of $f(x)$ w.r.t. the distribution $d(x)$, and $q_t(\cdot)$ is a shorthand for the approximating density $q(\cdot|y_{1:t}, \langle\theta\rangle_{1:t-1})$. A crucial implication of this recycling process is that we may discard observations after processing them. As a result, GOVI is both single-pass and computationally efficient, thereby achieving the desiderata of streaming methods [?].

⁴ We use a condensed integral notation: all integrals are definite integrals over the entire domain of interest.

⁵ A fully Bayesian treatment would require the specification of a hyperprior, but this is not pursued here owing to space restrictions.

⁶ $v_{1:t}$ denotes $v_1, \ldots, v_t$.
To determine $q_t(\mathbf{w}_{1:t}, \theta)$, one considers the lower bound:

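$$\log p(y_{1:t}|\langle\theta\rangle_{1:t-1}, \boldsymbol{\omega}) \geq \langle E_t(\mathbf{w}_{1:t}, \theta)\rangle_{q_t(\mathbf{w}_{1:t}, \theta)} + \langle\log p(\theta|\boldsymbol{\omega})\rangle_{q_t(\theta)} + H(q_t) = \mathcal{L}, \tag{13}$$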

where $E_t(\mathbf{w}_{1:t}, \theta) = \log p(y_{1:t}, \mathbf{w}_{1:t}|\theta, \langle\theta\rangle_{1:t-1})$ and $H(d)$ signifies the entropy of $d(x)$. The key approximation in VB, commonly called the mean-field approximation (MFA), is $q_t(\mathbf{w}_{1:t}, \theta) = q_t(\mathbf{w}_{1:t})\prod_i q_t(\theta_i)$, from which one may show that, for optimality of $\mathcal{L}$,

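$$q_t(\mathbf{w}_{1:t}) \propto q(y_{1:t}, \mathbf{w}_{1:t}) = \exp\langle E_t(\mathbf{w}_{1:t}, \theta)\rangle_{q_t(\theta)},\qquad q_t(\theta) \propto p(\theta|\boldsymbol{\omega})\exp\langle E_t(\mathbf{w}_{1:t}, \theta)\rangle_{q_t(\mathbf{w}_{1:t})}. \tag{14}$$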

These coupled equations need to be iterated to convergence. Our main concern is with the update for $q_t(\mathbf{w}_t)$, for which this paper departs from previously developed treatments [?]. We will present final results only, and refer the reader to the Supplementary Material for detailed derivations.

3.1 Approximate Filtering 

From Eqs. (11)-(14), it follows that

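$$q(y_{1:t}, \mathbf{w}_{1:t}) = \prod_{\tau=1}^{t} \underbrace{\mathcal{N}\!\left(y_\tau|\mathbf{x}_\tau^\top\mathbf{w}_\tau + \langle\mu\rangle_\tau, \langle\beta\rangle_\tau^{-1}\right)}_{=\,q(y_\tau|\mathbf{w}_\tau)\,\approx\,p(y_\tau|\mathbf{w}_\tau, \mu, \beta)} \times \underbrace{\mathcal{N}\!\left(\mathbf{w}_\tau|\mathbf{w}_{\tau-1}, \langle\alpha\rangle_\tau^{-1}\mathbf{I}_I\right)}_{=\,q(\mathbf{w}_\tau|\mathbf{w}_{\tau-1})\,\approx\,p(\mathbf{w}_\tau|\mathbf{w}_{\tau-1}, \alpha)}. \tag{15}$$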

Clearly, the above represents the joint distribution of the BYPASS model with sequentially updated, averaged parameters. Thus, inference can be performed using the standard Kalman filter (KF) equations [?]. A direct consequence is that the approximate filtering distribution is Gaussian:

$$q_t(\mathbf{w}_t) = \mathcal{N}(\mathbf{w}_t|\boldsymbol{\mu}^w_t, \boldsymbol{\Sigma}^w_t). \tag{16}$$

The moments of this distribution are iteratively updated as described in Algorithm 1.
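Steps 5, 7 and 8 of Algorithm 1 amount to one Kalman-filter step in which the unknown parameters are replaced by their current variational means. The NumPy sketch below (the name `kf_step` is hypothetical) simplifies by using a single set of averaged parameters for both the prediction and the update, whereas Algorithm 1 uses $\langle\cdot\rangle_{t-1}$ for prediction and $\langle\cdot\rangle_t$ for the update.

```python
import numpy as np

def kf_step(x, y, mu_w_prev, Sigma_w_prev, alpha_mean, beta_mean, mu_mean):
    """One Kalman-filter step with averaged parameters <alpha>, <beta>, <mu>.

    Returns the predictive output mean m and variance v, and the filtered
    moments (mu^w_t, Sigma^w_t) of q_t(w_t) in Eq. (16).
    """
    I = len(x)
    P = Sigma_w_prev + np.eye(I) / alpha_mean       # predictive weight covariance
    m = float(x @ mu_w_prev) + mu_mean              # predictive output mean
    v = float(x @ P @ x) + 1.0 / beta_mean          # predictive output variance
    g = (P @ x) / v                                 # Kalman gain
    mu_w = mu_w_prev + g * (y - m)                  # filtered mean
    Sigma_w = (np.eye(I) - np.outer(g, x)) @ P      # filtered covariance
    return m, v, mu_w, Sigma_w
```

In the scalar case with $\boldsymbol{\Sigma}^w_{t-1} = 0$ and $\langle\alpha\rangle = \langle\beta\rangle = 1$, this reduces to combining a $\mathcal{N}(0, 1)$ prior with one unit-variance observation, so $y_t = 2$ gives posterior mean $1$ and variance $0.5$.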

3.2 Mean Variational Parameters 
Update for α

The approximate posterior over the weight precision is a Gamma distribution whose mean can be found from the following fixed-point iteration:

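$$\langle\alpha\rangle_t^{\mathrm{new}} = \frac{2a + I}{2b + \|\boldsymbol{\mu}^w_t - \boldsymbol{\mu}^w_{t-1}\|^2 + \mathrm{tr}(\boldsymbol{\Sigma}^w_t - \boldsymbol{\Sigma}^w_{t-1})}. \tag{17}$$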
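Eq. (17) can be evaluated directly from the filtering moments. This sketch assumes the shape-rate form of $\mathcal{G}(\alpha|a, b)$ and an $I$-dimensional weight vector; under that assumption the numerator $2a + I$ is what Gamma-Gaussian conjugacy gives. The function name and the guard against a non-positive denominator are our additions.

```python
import numpy as np

def alpha_mean_update(a, b, mu_w, mu_w_prev, Sigma_w, Sigma_w_prev):
    """Fixed-point update of <alpha>_t as written in Eq. (17)."""
    I = len(mu_w)
    d = mu_w - mu_w_prev
    denom = 2.0 * b + float(d @ d) + float(np.trace(Sigma_w - Sigma_w_prev))
    if denom <= 0.0:
        raise ValueError("non-positive denominator in the <alpha> update")
    return (2.0 * a + I) / denom
```

Large jumps in the filtered mean inflate the denominator and so lower $\langle\alpha\rangle_t$, i.e. they increase the transition variance and make the next update more aggressive.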


Update for β

The variational posterior of $\beta$ is a generalised inverse Gaussian distribution defined by

$$q_t(\beta) = \mathcal{GIG}(\beta|-1, 1, \rho_t),\qquad \rho_t = (y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_t - \langle\mu\rangle_t)^2 + V^\mu_t, \tag{18}$$

where $V^\mu_t$ denotes the variance of $\mu$ under $q_t$. The corresponding update equation is therefore


$$\langle\beta\rangle_t = \frac{K_0(\sqrt{\rho_t})}{\sqrt{\rho_t}\,K_1(\sqrt{\rho_t})}, \tag{19}$$

where $K_\nu$ denotes the modified Bessel function of the second kind, with index $\nu$.
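Eq. (19) needs the ratio $K_0/K_1$. Under the convention $\mathcal{GIG}(\beta|p, b, a) \propto \beta^{p-1}\exp\{-(b/\beta + a\beta)/2\}$ (our assumption about the parameter order in Eq. (18)), the GIG mean with $p = -1$, $b = 1$, $a = \rho_t$ reduces to Eq. (19). To stay dependency-free, the sketch evaluates $K_\nu$ from the integral representation $K_\nu(z) = \int_0^\infty e^{-z\cosh t}\cosh(\nu t)\,\mathrm{d}t$; `scipy.special.kv` or `scipy.special.kve` would be the usual library choice. Function names are hypothetical.

```python
import math

def bessel_k_scaled(nu, z, n=4000):
    """exp(z) * K_nu(z) for z > 0, from K_nu(z) = int_0^inf exp(-z cosh t) cosh(nu t) dt.

    Scaling by exp(z) avoids underflow for large z; it cancels in the ratio below.
    """
    if z <= 0.0:
        raise ValueError("z must be positive")
    t_max = math.acosh(1.0 + 50.0 / z)   # beyond t_max, exp(-z (cosh t - 1)) < e^-50
    h = t_max / n
    total = 0.0
    for k in range(n):                    # midpoint rule on (0, t_max)
        t = (k + 0.5) * h
        total += math.exp(-z * (math.cosh(t) - 1.0)) * math.cosh(nu * t)
    return total * h

def beta_mean_update(rho):
    """Eq. (19): <beta>_t = K_0(sqrt(rho)) / (sqrt(rho) K_1(sqrt(rho)))."""
    r = math.sqrt(rho)
    return bessel_k_scaled(0, r) / (r * bessel_k_scaled(1, r))
```

Since $K_0(z)/K_1(z) \to 1$ as $z \to \infty$, large residuals give $\langle\beta\rangle_t \approx \rho_t^{-1/2}$, so the output precision shrinks as the squared residual grows.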

Update for μ

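The approximating density for $\mu$ is somewhat intractable and non-standard, but is roughly equal to a truncated Gaussian with lower and upper truncation values of $-\epsilon$ and $\epsilon$, respectively, so we set

$$q_t(\mu) = \mathcal{TN}_{[-\epsilon,\epsilon]}\!\left(\mu|y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_t, \langle\beta\rangle_t^{-1}\right). \tag{20}$$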

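From this, we obtain the following fixed-point equation in $\langle\mu\rangle_t$:

$$\langle\mu\rangle_t = y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_t + \frac{\phi(l_t) - \phi(u_t)}{\sqrt{\langle\beta\rangle_t}\,[\Phi(u_t) - \Phi(l_t)]}, \tag{21}$$

where

$$l_t = \sqrt{\langle\beta\rangle_t}\left[-\epsilon - (y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_t)\right],\qquad u_t = \sqrt{\langle\beta\rangle_t}\left[\epsilon - (y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_t)\right], \tag{22}$$

while $\phi(\cdot)$ and $\Phi(\cdot)$ denote the PDF and CDF of a standard Gaussian, respectively. Similarly,

$$V^{\mu,\mathrm{new}}_t = \frac{1}{\langle\beta\rangle_t}\left[1 + \frac{l_t\phi(l_t) - u_t\phi(u_t)}{\Phi(u_t) - \Phi(l_t)} - \left(\frac{\phi(l_t) - \phi(u_t)}{\Phi(u_t) - \Phi(l_t)}\right)^2\right]. \tag{23}$$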
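Eqs. (21)-(23) are the mean and variance of a Gaussian with mean $y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_t$ and precision $\langle\beta\rangle_t$, truncated to $[-\epsilon, \epsilon]$. A minimal pure-Python sketch follows; the names are hypothetical, and `residual` stands for $y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_t$.

```python
import math

def _phi(z):
    """Standard Gaussian PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mu_moments_update(residual, beta_mean, eps):
    """Return (<mu>_t, V^mu_t) from Eqs. (21)-(23)."""
    s = math.sqrt(beta_mean)
    l = s * (-eps - residual)                 # Eq. (22), lower standardised bound
    u = s * (eps - residual)                  # Eq. (22), upper standardised bound
    Z = _Phi(u) - _Phi(l)                     # probability mass inside [-eps, eps]
    if Z <= 0.0:
        raise ValueError("truncation mass underflowed; use a log-space evaluation")
    ratio = (_phi(l) - _phi(u)) / Z
    mean = residual + ratio / s               # Eq. (21)
    var = (1.0 + (l * _phi(l) - u * _phi(u)) / Z - ratio ** 2) / beta_mean   # Eq. (23)
    return mean, var
```

Two limits are useful sanity checks: a very small $\langle\beta\rangle_t$ makes $q_t(\mu)$ nearly uniform on $[-\epsilon, \epsilon]$, with variance close to $\epsilon^2/3$, and a very large $\langle\beta\rangle_t$ with a residual inside the interval leaves the Gaussian essentially untruncated.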


3.3 Relation to Streaming Variational Bayes 

In this section, we argue that GOVI falls under a broader family of online VB algorithms known as Streaming Variational Bayes (SVB) [?]. Note that Bayes' rule can be written in a streaming form:

$$p(\Theta|y_{1:t}) \propto p(y_t|\Theta)\,p(\Theta|y_{1:t-1}), \tag{24}$$

where $\Theta$ represents a set of stochastic parameters. SVB suggests that, when the above is infeasible to compute, one should adopt an approximation algorithm $\mathcal{A}$ such that

$$p(\Theta|y_{1:t}) \approx q_t(\Theta) = \mathcal{A}(y_t, q_{t-1}(\Theta)), \tag{25}$$

with $q_0(\Theta) = p(\Theta)$. When $\mathcal{A}$ generates the posterior from Bayes' theorem, this calculation is exact. In the setting of BYPASS, $\Theta = \{\mathbf{w}_t, \theta\}$ and, by MFA, we obtain two separate approximation algorithms, namely

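$$q_t(\mathbf{w}_t) = \mathcal{A}_w(y_t, q_{t-1}(\mathbf{w}_{t-1}))\quad\text{and}\quad q_t(\theta) = \mathcal{A}_\theta(y_t, p(\theta|\boldsymbol{\omega})), \tag{26}$$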

the latter having to ineluctably rely on a time-invariant prior over $\theta$, as the BYPASS framework does not specify any dynamics in that regard. More precisely, we have

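$$q_t(\mathbf{w}_t) \propto \mathcal{N}\!\left(y_t|\mathbf{x}_t^\top\mathbf{w}_t + \langle\mu\rangle_t, \langle\beta\rangle_t^{-1}\right)\underbrace{\int \mathcal{N}\!\left(\mathbf{w}_t|\mathbf{w}_{t-1}, \langle\alpha\rangle_t^{-1}\mathbf{I}_I\right) q_{t-1}(\mathbf{w}_{t-1})\,\mathrm{d}\mathbf{w}_{t-1}}_{=\,q_{t-1}(\mathbf{w}_t)}, \tag{27}$$

$$q_t(\theta) \propto q_t(y_t|\theta)\,p(\theta|\boldsymbol{\omega}),\qquad q_t(y_t|\theta) = \exp\left\{\left\langle\log\left[p(y_t|\mathbf{w}_t, \mu, \beta)\,p(\mathbf{w}_t|\mathbf{w}_{t-1}, \alpha)\right]\right\rangle_{q_t(\mathbf{w}_t, \mathbf{w}_{t-1})}\right\}. \tag{28}$$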

Interestingly, from Eq. (27) we are able to recover the KF equations evaluated at the mean variational parameters. The aforementioned departure from previously presented treatments thus emanates from the fact that we make the first application of SVB to the Bayesian LGSSM setting.
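Eq. (25) is exact when $\mathcal{A}$ is Bayes' rule itself. The toy sketch below illustrates this with a conjugate model that is not BYPASS (a Gaussian mean with known noise variance, chosen only for illustration): a single streaming pass reproduces the batch posterior, while each observation is used once and then discarded. Names are hypothetical.

```python
def svb_gaussian_mean_step(q_prev, y, noise_var):
    """A(y_t, q_{t-1}) for a Gaussian mean with known noise variance (exact Bayes)."""
    m, v = q_prev                                  # q_{t-1} = N(m, v)
    v_new = 1.0 / (1.0 / v + 1.0 / noise_var)      # precisions add
    m_new = v_new * (m / v + y / noise_var)        # precision-weighted mean
    return m_new, v_new

def svb_stream(ys, prior, noise_var):
    """Single pass over the stream: q_0 = prior, then q_t = A(y_t, q_{t-1})."""
    q = prior
    for y in ys:                                   # y is not needed after this step
        q = svb_gaussian_mean_step(q, y, noise_var)
    return q
```

With prior $\mathcal{N}(0, 1)$, unit noise variance and observations $1, 2, 3$, the batch posterior has precision $4$ and mean $6/4 = 1.5$, which is what the streaming recursion returns. In BYPASS, by contrast, $\mathcal{A}_w$ and $\mathcal{A}_\theta$ in Eq. (26) are only approximate.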


4 Learning the hyperparameters: Adaptive BYPASS 


As far as variational inference in Bayesian LGSSMs is concerned, the optimal hyperparameter values are typically obtained by optimising the variational lower bound $\mathcal{L}$ w.r.t. $\boldsymbol{\omega}$ [?]. However, this would not be computationally viable in a streaming environment. Since we are not treating $\boldsymbol{\omega}$ as a random vector, we may readily apply the PA regression framework from Section 2.1 to automatically tune $\boldsymbol{\omega}$ in an online manner. To mimic the ML-II ('evidence') framework, we use the negative log likelihood of the BYPASS model as the underlying loss function. This gives rise to the following optimisation problem:


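$$\boldsymbol{\omega}_t = \operatorname*{arg\,min}_{\boldsymbol{\omega} > \mathbf{0}_{M\times 1}}\ \frac{1}{2}\|\boldsymbol{\omega} - \boldsymbol{\omega}_{t-1}\|^2 + C_\omega\,\frac{\langle\beta\rangle_{t-1}}{2}\left(y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_{t-1} - \langle\mu\rangle_{t-1}\right)^2, \tag{29}$$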


where $M = \dim(\boldsymbol{\omega})$. We remark that, by construction, this problem corresponds to sequential maximum likelihood at the hyperparameter level. Its objective function depends on $\boldsymbol{\omega}$ insofar as the latter is employed to determine the weight estimates $\boldsymbol{\mu}^w_{t-1}$. To convert this problem into a more 'conventional' one, we replace the strict-positivity constraints $\boldsymbol{\omega} > \mathbf{0}_{M\times 1}$ by $\boldsymbol{\omega} \geq \boldsymbol{\omega}_{\min}$, where $\boldsymbol{\omega}_{\min} > \mathbf{0}_{M\times 1}$ represents a lower bound on $\boldsymbol{\omega}$. We consequently get (see Supplementary Material)




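$$\omega_t = \max\left\{\omega_{\min},\ \omega_{t-1} + C_\omega\langle\beta\rangle_{t-1}\,(\boldsymbol{\psi}^{\omega}_{t-1})^\top\mathbf{x}_t\left(y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_{t-1} - \langle\mu\rangle_{t-1}\right)\right\},\qquad \forall\,\omega \in \boldsymbol{\omega}, \tag{30}$$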
where the max operator is taken element-wise, $\mathbf{1}_{D\times 1}$ denotes a $D$-dimensional vector of ones and, for each $\omega \in \boldsymbol{\omega}$, $\boldsymbol{\psi}^{\omega}_t$ denotes the gradient of $\boldsymbol{\mu}^w_t$ w.r.t. $\omega$, evaluated at $\omega = \omega_t$. As demonstrated in the Supplementary Material, this gradient is updated in an iterative fashion, based on its previous value and on $\mathbf{S}_t$, a companion gradient term evaluated at $\omega_t$. We dub the ensuing algorithm adaptive BYPASS (ADA-BYPASS). The implementation details of the latter and of its non-adaptive counterpart are outlined in Algorithms 2 and 1, respectively.


Algorithm 1 BYPASS 

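1: Input: Hyperparameters $\boldsymbol{\omega}$, initial mean variational parameters $\langle\theta\rangle_0$.
2: Set $\boldsymbol{\mu}^w_0 = \mathbf{0}_{I\times 1}$ and $\boldsymbol{\Sigma}^w_0 = \mathbf{0}_{I\times I}$.
3: for $t = 1, 2, \ldots$ do
4: Obtain new inputs $\mathbf{x}_t$.
5: Compute the predictive mean and variance of the output:
$m_t = \mathbf{x}_t^\top\boldsymbol{\mu}^w_{t-1} + \langle\mu\rangle_{t-1}$, $v_t = \mathbf{x}_t^\top\mathbf{P}^w_{t-1}\mathbf{x}_t + \langle\beta\rangle_{t-1}^{-1}$.
6: Derive the new mean variational parameters $\langle\theta\rangle_t$ by repeating the fixed-point iterations (17), (19), (21) and (23) until convergence.
7: Evaluate the predictive weight covariance and the Kalman gain:
$\mathbf{P}^w_{t-1} = \boldsymbol{\Sigma}^w_{t-1} + \langle\alpha\rangle_t^{-1}\mathbf{I}_I$, $\mathbf{g}_t = \left(\mathbf{x}_t^\top\mathbf{P}^w_{t-1}\mathbf{x}_t + \langle\beta\rangle_t^{-1}\right)^{-1}\mathbf{P}^w_{t-1}\mathbf{x}_t$.
8: Update the mean and covariance of the approximate filtering distribution $q_t(\mathbf{w}_t)$:
$\boldsymbol{\mu}^w_t = \boldsymbol{\mu}^w_{t-1} + \mathbf{g}_t\left(y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_{t-1} - \langle\mu\rangle_t\right)$, $\boldsymbol{\Sigma}^w_t = \left(\mathbf{I}_I - \mathbf{g}_t\mathbf{x}_t^\top\right)\mathbf{P}^w_{t-1}$.
9: end for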


Algorithm 2 ADA-BYPASS: BYPASS with hyperparameter adaptation via PA regression. 

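1: Input: Initial hyperparameters $\boldsymbol{\omega}_0$, lower hyperparameter bounds $\boldsymbol{\omega}_{\min}$, initial mean variational parameters $\langle\theta\rangle_0$, initial variational variance $V_0$, aggressiveness parameter $C_\omega > 0$.
2: Same as Step 2 in Algorithm 1.
3: Initialise the gradients w.r.t. $\omega \in \boldsymbol{\omega}$: $\boldsymbol{\psi}^{\omega}_0 = \mathbf{0}_{I\times 1}$, $\mathbf{S}_0 = \mathbf{I}_I$.
4: for $t = 1, 2, \ldots$ do
5: Same as Steps 4-5 in Algorithm 1.
6: Update the hyperparameters according to Eq. (30).
7: Same as Steps 6-8 in Algorithm 1.
8: Update the gradients:
$\mathbf{S}_t = \left(\mathbf{I}_I - \mathbf{g}_t\mathbf{x}_t^\top\right)\mathbf{S}_{t-1}\left(\mathbf{I}_I - \mathbf{x}_t\mathbf{g}_t^\top\right)$,
$\boldsymbol{\psi}^{\omega}_t = \left(\mathbf{I}_I - \mathbf{S}_t\mathbf{x}_t\mathbf{x}_t^\top\right)\boldsymbol{\psi}^{\omega}_{t-1} + \langle\beta\rangle_t\mathbf{S}_t\mathbf{x}_t\left(y_t - \mathbf{x}_t^\top\boldsymbol{\mu}^w_{t-1} - \langle\mu\rangle_t\right)$.
9: end for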


5 Applications 

5.1 Practicalities 

Based on the sensitivity analysis in [?], we set the aggressiveness parameter equal to $10^{-3}$. The model parameters are initialised at their prior means, except for the output precision $\beta$, whose prior mean is undefined. A similar principle is applied to the variational variance of $\mu$. As a result, we obtain $\langle\alpha\rangle_0 = a/b$, $\langle\mu\rangle_0 = 0$ and $V_0 = \mathrm{Var}[\mathcal{U}(\mu|-\epsilon, \epsilon)] = \epsilon^2(1 + \epsilon/3)/(1 + \epsilon)$. As for $\beta$, we approximate its prior mean as follows: $\langle\beta\rangle_0 \approx 0.5/10^{-3} = 500$.

Next, we choose initial values for the hyperparameters. In order to initially emulate the frequentist PA regression framework (Section 2.1) while simultaneously making $p(\alpha|a, b)$ 'uninformative' (i.e. broad), we set $a = C^{-1}$ and $b = 1$. As for the insensitivity hyperparameter, we use $\epsilon = 1.25$, this value being the mean of a symmetric Beta distribution of the second kind⁷ with shape parameter $s = 5$, the choice of which was motivated by [?]. Finally, we used a common small lower bound $\omega_{\min}$ for every component $\omega_{\min} \in \boldsymbol{\omega}_{\min}$.
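The initial values above are simple arithmetic. The sketch below reproduces them, assuming the aggressiveness parameter is $C = 10^{-3}$ and that the 'Beta distribution of the second kind' is the beta-prime distribution, whose mean $s/(s-1)$ gives $\epsilon = 5/4 = 1.25$ for $s = 5$. It also recomputes $V_0$ directly from Eq. (8) as a consistency check. The function name is hypothetical.

```python
def bypass_initial_values(C=1e-3, s=5.0):
    """Initial values of Section 5.1 (C = 1e-3 is an assumption of this sketch)."""
    a, b = 1.0 / C, 1.0                                  # a = C^{-1}, b = 1
    eps = s / (s - 1.0)                                  # mean of beta-prime(s, s), needs s > 1
    V0 = eps ** 2 * (1.0 + eps / 3.0) / (1.0 + eps)      # Var[U(mu | -eps, eps)], closed form
    # Same variance straight from Eq. (8): int_{-eps}^{eps} mu^2 d(mu) plus two atoms at +/-eps
    V0_check = (2.0 * eps ** 3 / 3.0 + 2.0 * eps ** 2) / (2.0 * (1.0 + eps))
    return {
        "a": a, "b": b, "eps": eps,
        "alpha0": a / b,            # prior mean of G(alpha | a, b)
        "mu0": 0.0,                 # prior mean of U(mu | -eps, eps)
        "V0": V0, "V0_check": V0_check,
        "beta0": 0.5 / 1e-3,        # approximation <beta>_0 ~ 0.5 / 10^-3 = 500 stated above
    }
```

With these settings, $\langle\alpha\rangle_0 = 1000$ and $V_0 \approx 0.984$.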

5.2 Model specification and benchmark 

In the following experiments, unless otherwise stated, we used an autoregressive measurement equation of order 1 (AR(1)): $y_t = w_{t,0} + w_{t,1}y_{t-1} + \eta_t$, where $w_{t,0}$ is a bias parameter. While this is perhaps not the best specification, feature selection goes beyond the scope of the present study. It is worth noting, however, that there is no theoretical or practical obstacle that would prevent us from considering more complex predictors, which would be expected to further improve the model's performance.

We make comparisons with a standard LGSSM in which a MAP recursion is used to govern the adaptation of the model parameters, by sequentially using the maximum-likelihood formulation first proposed by [?]. To ensure full comparability of results, we also endow this model with an AR(1) hypothesis, and refer to it as the sequential Kalman filter (SKF) in the applications below.

In both models, one-step-ahead forecasts are successively iterated to provide multi-step forecasts of arbitrary length, as needed. Missing values, if they occur, are accommodated using the scheme advocated by [?], in which they are replaced by their expectations under the corresponding model.
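The AR(1) specification corresponds to the regressor $\mathbf{x}_t = (1, y_{t-1})^\top$. The sketch below (hypothetical helpers) shows one plug-in way to iterate one-step-ahead forecasts into multi-step ones and to replace a missing observation by its model expectation. Freezing the weights at their current mean over the forecast horizon is our simplifying assumption.

```python
import math

def ar1_regressor(y_prev):
    """x_t = (1, y_{t-1}): bias term plus the previous output."""
    return (1.0, y_prev)

def iterate_forecasts(y_last, w_mean, mu_mean, horizon):
    """Plug-in multi-step forecasts: each one-step mean is fed back as the next y_{t-1}."""
    preds, y_prev = [], y_last
    for _ in range(horizon):
        x0, x1 = ar1_regressor(y_prev)
        y_hat = x0 * w_mean[0] + x1 * w_mean[1] + mu_mean   # x^T E[w] + <mu>
        preds.append(y_hat)
        y_prev = y_hat
    return preds

def fill_missing(y_obs, y_expected):
    """Replace a missing observation (None or NaN) by its expectation under the model."""
    if y_obs is None or math.isnan(y_obs):
        return y_expected
    return y_obs
```

For example, with $\mathbb{E}[\mathbf{w}] = (1, 0.5)$, $\langle\mu\rangle = 0$ and last observation $0$, the three-step forecasts are $1$, $1.5$ and $1.75$, converging towards the AR(1) fixed point $2$.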

5.3 Nile data 

We first consider a canonical changepoint data set, the minimum water levels of the Nile river during the period AD 622-1284 [?]. Several authors have found evidence supporting a changepoint for these data around AD 720-722 [?, 19, 20]. The conjectured reason for this changepoint is the construction in AD 715 of a new device (a 'nilometer') on the island of Roda, which affected the nature and accuracy of the measurements.

We performed one-year-ahead prediction on this data set. The results can be seen in Fig. 1. ADA-BYPASS outperforms the SKF on every reported metric, with lower RMSE, MAD and MAE and a higher predictive log likelihood.



| Metric | ADA-BYPASS | SKF |
|---|---|---|
| RMSE (cm) | 0.72 | 0.85 |
| MAD (cm) | 0.42 | 0.45 |
| MAE (cm) | 0.54 | 0.62 |
| LL | -754.2 | -971.78 |


Figure 1: Online one-year-ahead predictions for the Nile's minimum water levels. Left panel: observed levels (black diamonds), predicted levels (red line) and ±1 standard deviation error bars (pink area). Right panel: predictive performances; error metrics shown are root mean squared error (RMSE), mean absolute deviation (MAD), mean absolute error (MAE) and predictive log likelihood (LL).
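For reference, RMSE, MAE and the predictive log likelihood can be computed as below, assuming Gaussian predictive distributions with the means $m_t$ and variances $v_t$ of Algorithm 1. The paper does not spell out its MAD formula, so it is omitted here. Function names are hypothetical.

```python
import math

def rmse(y, m):
    """Root mean squared error between observations y and predictive means m."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, m)) / len(y))

def mae(y, m):
    """Mean absolute error between observations y and predictive means m."""
    return sum(abs(a - b) for a, b in zip(y, m)) / len(y)

def gaussian_predictive_ll(y, m, v):
    """Sum over t of log N(y_t | m_t, v_t), assuming Gaussian one-step predictives."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * vt) + (yt - mt) ** 2 / vt)
        for yt, mt, vt in zip(y, m, v)
    )
```

Unlike RMSE and MAE, the log likelihood also rewards well-calibrated predictive variances, which is why it is the metric most sensitive to the probabilistic side of BYPASS.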


5.4 Wind speed data 

To demonstrate the performance of ADA-BYPASS on a large data set, we next present the series of anemometer wind speed measurements (in m/s) from a Danish wind turbine. The data were sampled at 10-minute intervals for just over nine months, resulting in a total of 40,174 measurements. The 10-minute-ahead predictive performance achieved by each method is reported in Table 1.

⁷ We show in the Supplementary Material that the form of $p(\eta_t|\epsilon)$ induces this prior for $\epsilon$.
















Table 1: Predictive performance of ADA-BYPASS vs SKF on the wind speed data set.

| Metric | ADA-BYPASS | SKF |
|---|---|---|
| RMSE (m/s) | 0.6 | 0.64 |
| MAD (m/s) | 0.3 | 0.31 |
| MAE (m/s) | 0.42 | 0.44 |
| LL | -24,971.75 | -30,140.04 |


5.5 Statistical Arbitrage 

LGSSMs, and variants thereof, have seen widespread use in statistical arbitrage strategies, notably in pairs trading [?]. In this area, they serve as a dynamic model for the price spread between two assets. In our application, we seek to find the hedge ratio⁸ and the predictive standard deviation of the spread. The observable variable is thus one of the price series, $y$, and the hidden variable is the hedge ratio, $w$. We assume that both variables obey the ADA-BYPASS dynamics, i.e.

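$$y_t = w_t x_t + \eta_t,\quad \eta_t \sim \mathcal{N}(\eta_t|\mu, \beta^{-1}),\qquad w_t = w_{t-1} + \zeta_t,\quad \zeta_t \sim \mathcal{N}(\zeta_t|0, \alpha^{-1}), \tag{31}$$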

where $x$ is the price series of the other asset. Typically, $\alpha$, $\beta$ and $\mu$ are manually selected in hindsight [?]. However, this practice is highly prone to the so-called data-snooping bias: these parameters can be tweaked so as to optimise the backtesting performance of the strategy. The ADA-BYPASS algorithm tunes its underlying parameters automatically and online, so it does not suffer from this caveat.

We tested ADA-BYPASS on a pair of exchange-traded funds (ETFs) consisting of the SPDR Gold Trust (GLD) and the gold-miners ETF GDX. This ETF pairing is a favourite in the financial industry, because the value of gold-mining companies is largely driven by the value of gold. We downloaded the corresponding daily adjusted closing prices from Yahoo! Finance, between 22/05/2006 and 22/04/2015.

Rather than maximising profits, most investors attempt to maximise risk-adjusted returns, as advocated by modern portfolio theory. The Sharpe ratio is the most widely used measure of risk-adjusted returns [?]. Besides the Sharpe ratio, the maximum drawdown and maximum drawdown duration are two other popular metrics to evaluate trading strategies. From Table 2, we can clearly discern that ADA-BYPASS beats SKF by a significant margin in terms of the aforementioned performance metrics.


Table 2: Performance of the GDX-GLD pairs trade under ADA-BYPASS and SKF.

| Metric | ADA-BYPASS | SKF |
|---|---|---|
| Sharpe ratio | 1.12 | 0.7 |
| Maximum drawdown (%) | 14.61 | 73.05 |
| Maximum drawdown duration (trading days) | 375 | 567 |
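The three metrics in Table 2 have standard definitions. The sketch below assumes daily strategy returns, a zero risk-free rate and $\sqrt{252}$ annualisation for the Sharpe ratio, and measures drawdown duration as the longest run of periods spent below a previous peak of a positive equity curve. These conventions are our assumptions, not necessarily those used for Table 2, and the function names are hypothetical.

```python
import math

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualised Sharpe ratio with a zero risk-free rate (sample standard deviation)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return math.sqrt(periods_per_year) * mean / math.sqrt(var)

def max_drawdown(equity):
    """Largest peak-to-trough fall (fraction of the peak) and longest time below a peak."""
    peak, peak_idx = equity[0], 0
    mdd, mdd_dur = 0.0, 0
    for i, e in enumerate(equity):
        if e >= peak:                          # new high-water mark ends any drawdown
            peak, peak_idx = e, i
        else:
            mdd = max(mdd, (peak - e) / peak)
            mdd_dur = max(mdd_dur, i - peak_idx)
    return mdd, mdd_dur
```

For the equity curve $100, 120, 90, 95, 130, 100$, the maximum drawdown is $25\%$ (from $120$ to $90$) and the longest time below a peak is $2$ periods.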


6 Concluding remarks 

We introduced the first online Bayesian PA regression model within the state-space setting, along with a novel, online variational inference algorithm. This model is well suited to the probabilistic prediction of non-stationary and/or very large time series, in particular massive, time-varying data streams. Results on three real-world data sets show significant improvements in predictive performance over a more standard LGSSM.


⁸ The hedge ratio of a particular asset is the number of units of that asset we should buy or sell in a portfolio. If the asset is a stock, then the number of units corresponds to the number of shares. A negative hedge ratio indicates we should sell that asset.